Introduction

FTICR Analysis

  • Fourier-transform ion cyclotron resonance (FTICR) mass spectrometry is a type of mass spectrometery for determining the mass-to-charge ratio (m/z) of ions based on the cyclotron frequency of the ions in a fixed magnetic field.

  • Chemical composition can be determined for a portion of the observed peaks/mass-to-charge ratios.

  • Data resulting from FTICR instrument runs can be interpreted as the area under a peak for each observed peak. These are comparable in magnitude within a sample but are not comparable across samples.

fticRanalysis

The fticRanalysis package was designed to help with various steps of processing FTICR data, including:

  • data formatting and manipulation
  • reproducible analysis pipeline
  • filtering data based on various properties
  • calculating meta information for each peak (e.g. NOSC)
  • data visualization and summary
    • one sample
    • multiple samples
    • group comparisons

The Experiment

  • Goal: assess differences in soil organic matter between multiple locations and crop types
  • Measurement: 12T FTICR mass spectrometer
  • Subset of full experiment data
  • Locations: M, W
  • Crop Flora: S, C

Data loading

Experimental data

We have found that most ’omics data generated by EMSL can be divided into three parts:

  • Expression Data - observed data for each biomolecule (rows) and sample (columns)
    • e.g. area under peaks for mass spec data
    • e.g. spectral counts for each protein
  • Sample Data - data capturing relevant experimental factors (columns) for each sample (rows)
    • e.g. samples and their sampling locations, treatment applied, etc.
  • Biomolecule Data - other characteristics or quantified values (columns) for each biomolecule (rows)
    • e.g. peptide to protein mapping

Edata (Expression Data)

The edata object is a data frame with one row per peak and one column per sample. It must have one column that is a unique ID (e.g. Mass).

library(fticRanalysis)
data("fticr12T_edata")
str(fticr12T_edata)
## 'data.frame':    24442 obs. of  21 variables:
##  $ Mass         : num  98.5 98.8 98.8 101.7 103.3 ...
##  $ EM0011_sample: num  0 0 5524739 0 0 ...
##  $ EM0013_sample: num  0 13070372 0 0 0 ...
##  $ EM0015_sample: num  0.0 0.0 2.4e+07 0.0 0.0 ...
##  $ EM0017_sample: num  0 16120890 0 0 0 ...
##  $ EM0019_sample: num  0 21228496 0 0 0 ...
##  $ EM0061_sample: num  1197974 0 30656158 0 0 ...
##  $ EM0063_sample: num  0 12305626 0 0 0 ...
##  $ EM0065_sample: num  0.0 1.1e+07 0.0 0.0 0.0 ...
##  $ EM0067_sample: num  0 0 12664590 0 0 ...
##  $ EM0069_sample: num  2535836 38329628 0 0 0 ...
##  $ EW0111_sample: num  0 0 21416774 0 0 ...
##  $ EW0113_sample: num  0 8070914 0 0 0 ...
##  $ EW0115_sample: num  3636046 0 38608164 0 0 ...
##  $ EW0117_sample: num  0 3965230 0 0 0 ...
##  $ EW0119_sample: num  0 0 2439325 0 1153547 ...
##  $ EW0161_sample: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EW0163_sample: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EW0165_sample: num  0 0 0 16443347 0 ...
##  $ EW0167_sample: num  0 1598118 0 0 0 ...
##  $ EW0169_sample: num  0 0 0 0 0 0 0 0 0 0 ...

Fdata (Sample Data)

The fdata object is a data frame with one row per sample with information about experimental conditions. It must have a column that matches the sample column names in edata.

data("fticr12T_fdata")
str(fticr12T_fdata)
## 'data.frame':    20 obs. of  4 variables:
##  $ SampleID  : Factor w/ 20 levels "EM0011_sample",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Location  : Factor w/ 2 levels "M","W": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Block     : int  1 2 3 4 5 1 2 3 4 5 ...
##  $ Crop.Flora: Factor w/ 2 levels "C","S": 2 2 2 2 2 1 1 1 1 1 ...

Emeta (Biomolecule Data)

The emeta object is a data frame with one row per peak and columns containing other meta data such as molecular formula or elemental columns. It must have an ID column corresponding to the ID column in edata.

data("fticr12T_emeta")
str(fticr12T_emeta)
## 'data.frame':    24442 obs. of  10 variables:
##  $ Mass       : num  98.5 98.8 98.8 101.7 103.3 ...
##  $ C          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ H          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ O          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ N          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ C13        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ S          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ P          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Error      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ NeutralMass: num  99.5 99.8 99.8 102.7 104.4 ...

Constructing a peakIcrData object

picr <- as.peakIcrData(fticr12T_edata, fticr12T_fdata, fticr12T_emeta, 
                       edata_cname="Mass", fdata_cname="SampleID", 
                       mass_cname="Mass", c_cname="C", h_cname="H", 
                       o_cname="O", n_cname="N", s_cname="S", 
                       p_cname="P", isotopic_cname = "C13", 
                       isotopic_notation = "1")
picr
## peakIcrData object
## # Peaks: 23060
## # Samples: 20
## Meta data columns: [Mass, C, H, O, N, C13, S, P, Error, NeutralMass, MolForm]

Other Parameters

  • data_scale - assumed to be ‘abundance’ or area under the peak. Other options include: log2, log10, log, presence/absence (0/1)
  • instrument_type - assumed to be 12T/15T; vizualization methods for 21T coming
  • extraction_cname - name of column in f_data specifying extraction (e.g. water)
  • must give elemental count columns or molecular formula
  • isotopic column is optional - isotopes are currently filtered out of the data
  • data_scale - assumed to be ‘abundance’ or area under the peak. Other options include: log2, log10, log, presence/absence (0/1)
  • instrument_type - assumed to be 12T/15T; vizualization methods for 21T coming
  • extraction_cname - name of column in f_data specifying extraction (e.g. water)

peakIcrDataObject

The peakIcrData object contains three elements, named e_data, f_data, and e_meta:

names(picr)
## [1] "e_data" "f_data" "e_meta"

During object constructing, the molecular formula is calculated from the elemental columns:

tail(picr$e_meta)
##              Mass  C  H  O N C13 S P     Error NeutralMass     MolForm
## 24437 897.1796269  0  0  0 0   0 0 0 0.0000000    898.1869        <NA>
## 24438 897.2209292  0  0  0 0   0 0 0 0.0000000    898.2282        <NA>
## 24439 897.3973977 36 69 22 1   0 0 1 0.2345417    898.4047 C36H69NO22P
## 24440  898.812526  0  0  0 0   0 0 0 0.0000000    899.8198        <NA>
## 24441 899.0458907  0  0  0 0   0 0 0 0.0000000    900.0532        <NA>
## 24442 899.3370941  0  0  0 0   0 0 0 0.0000000    900.3444        <NA>

Summary method

summary(picr)
## Samples: 20
## Molecules: 23060
## Percent Missing: 81.739%

Plot method

plot(picr)

Preprocessing

Transforming abundance values

  • When dealing with ’omics data quantitatively, we often log-transform to stabilize variances and reduce skew for downstream statistics.

  • It’s common to treat FTICR data as presence/absence data

picr <- edata_transform(picr, data_scale="log2")

Or replace 0s with NAs

edata_replace(picr, x = 0, y = NA)

Plot transformed data

plot(picr)

Calculating meta-data values

NOSC, aromaticity, O:C and H:C ratios, etc.

picr <- compound_calcs(picr)
picr
## peakIcrData object
## # Peaks: 23060
## # Samples: 20
## Meta data columns: [Mass, C, H, O, N, C13, S, P, Error, NeutralMass, MolForm, AI, AI_Mod, DBE, DBE_O, DBE_AI, GFE, kmass, kdefect, NOSC, OtoC_ratio, HtoC_ratio, NtoC_ratio, PtoC_ratio, NtoP_ratio]

Assigning classes

bs1 - Kim, S., et al (2003). Analytical Chemistry.
bs2 - Bailey, V. et al (2017). Soil Biology and Biochemistry.
bs3 - Rivas-Ubach, A., et al (2018). Analytical chemistry. – coming soon

picr <- assign_class(picr, boundary_set = "bs1")
table(picr$e_meta$Class)
## < table of extent 0 >

Filtering

There are multiple types of filtering algorithms provided in fticRanalysis:

  • Molecule filter: filter rows of e_data to exclude rows observed in too few samples
  • Mass filter: filter rows based on mass, e.g. to reflect observational sensitivity of the instrument
  • Formula filter: filter rows based on whether they have a molecular formula
  • Emeta filter: filter rows of e_data based on a quantity/column in e_meta

Filtering example

filter_obj <- mass_filter(picr)
plot(filter_obj, min_mass=200, max_mass=900)
summary(picr)
## Samples: 20
## Molecules: 23060
## Percent Missing: 81.739%
picr <- applyFilt(filter_obj, picr, min_mass = 200, 
                  max_mass = 900)
summary(picr)
## Samples: 20
## Molecules: 19327
## Percent Missing: 79.299%

Other filter types

Other filtering options include number of molecule observations, formula presence or absence, or emeta columns.

picr <- applyFilt(molecule_filter(picr), picr, min_num=2)
picr <- applyFilt(formula_filter(picr), picr)
picr <- applyFilt(emeta_filter(picr, "NOSC"), picr, min_val = 0.5)
summary(picr)
## Samples: 20
## Molecules: 1521
## Percent Missing: 50.352%

Visualizations of one sample

Subsetting to one sample

one_sample <- subset(picr, samples="EM0011_sample")
summary(one_sample)
## Samples: 1
## Molecules: 1521
## Percent Missing: 57.791%
head(one_sample$e_data)
##             Mass EM0011_sample
## 3746 200.9433045            NA
## 3748 200.9863723            NA
## 3774 202.9413892            NA
## 3844 209.0091827      20.32056
## 3892 212.0200788      20.82018
## 3909  213.004142            NA

Van Krevelen plot

A Van Krevelen plot shows H:C ratio vs O:C ratio for each peak observed in a sample that has a mass formula (thus H:C and O:C are known). By default, the points are colored according to functional class.

Van Krevelen plot example

vanKrevelenPlot(one_sample, title="EM0011_sample")

Customizing a Van Krevelen plot

By default, this function colors by Van Krevelen class. However, we can also color the points according to other meta data columns in the e_meta object.

vanKrevelenPlot(one_sample, colorCName="PtoC_ratio", 
                title="Color by P:C Ratio", legendTitle = "P:C Ratio")

Density plot

We can also plot the distributions of any (numeric) column of meta-data (i.e. column of e_meta).

densityPlot(one_sample, variable = "NOSC", plot_curve=TRUE, plot_hist=TRUE,
            title="NOSC Distribution for EM0011_sample")

Histogram

It’s also possible to plot just the histogram or just the density curve with this function with the plot_hist and plot_curve parameters.

densityPlot(one_sample, variable = "kmass", 
            title="Kendrick Mass for EM0011_sample", plot_hist=TRUE, 
            plot_curve = FALSE, yaxis="count")

Kendrick plot

A Kendrick plot shows Kendrick Defect vs Kendrick mass for each observed peak.

Ions of the same family have the same Kendrick mass defect and are positioned along a horizontal line on the plot. Kendrick plot is often used in conjunction with a Van Krevelen plot for evaluating elemental composition.

kendrickPlot(one_sample, title="Kendrick Plot for EM0011_sample")

Analysis of experimental groups

Experiment goal

The goal of this experiment was to identify differences in soil organic matter between sample locations and crop types.

In order to do that we need to compare experimental treatments (groups).

How to define groups

The group_designation method defines treatment groups based on the variable(s) specified as main effects.

picr <- group_designation(picr, main_effects=c("Crop.Flora"))
getGroupDF(picr)
##         SampleID Group
## 1  EM0011_sample     S
## 2  EM0013_sample     S
## 3  EM0015_sample     S
## 4  EM0017_sample     S
## 5  EM0019_sample     S
## 6  EM0061_sample     C
## 7  EM0063_sample     C
## 8  EM0065_sample     C
## 9  EM0067_sample     C
## 10 EM0069_sample     C
## 11 EW0111_sample     C
## 12 EW0113_sample     C
## 13 EW0115_sample     C
## 14 EW0117_sample     C
## 15 EW0119_sample     C
## 16 EW0161_sample     S
## 17 EW0163_sample     S
## 18 EW0165_sample     S
## 19 EW0167_sample     S
## 20 EW0169_sample     S

Summarizing groups

The summarizeGroups function calculates group-level summaries per peak, such as the number or proportion of samples in which peak is observed. The resulting object’s e_data element contains one column per group, per summary function.

group_summary <- summarizeGroups(picr, summary_functions = 
                                   c("n_present", "prop_present"))
head(group_summary$e_data)
##          Mass S_n_present S_prop_present C_n_present C_prop_present
## 1 200.9433045           3            0.3           4            0.4
## 2 200.9863723           2            0.2           0            0.0
## 3 202.9413892           1            0.1           1            0.1
## 4 209.0091827          10            1.0           7            0.7
## 5 212.0200788           9            0.9           9            0.9
## 6  213.004142           2            0.2           3            0.3

Comparing distributions between two groups

densityPlot(picr, samples=FALSE, groups=c("S","C"), variable="NOSC", 
            title="Comparison of NOSC Between Crop Types") 

Constructing group comparison subsets

Create peakIcrData objects that each contain two groups to facilitate group comparisons

byGroup <- divideByGroupComparisons(picr, 
                                comparisons = "all")[[1]]$value

crop_unique <- summarizeGroupComparisons(byGroup, 
            summary_functions="uniqueness_gtest", 
            summary_function_params=list(
                  uniqueness_gtest=list(pres_fn="nsamps", 
                          pres_thresh=2, pvalue_thresh=0.05)))

tail(crop_unique$e_data)
##             Mass uniqueness_gtest
## 1516 653.0299655             <NA>
## 1517 657.0215477             <NA>
## 1518 658.1682292 Observed in Both
## 1519 658.1686302 Observed in Both
## 1520 700.9749846             <NA>
## 1521  719.098301             <NA>

Van Krevelen plot comparing groups

vanKrevelenPlot(crop_unique, colorCName = "uniqueness_gtest")

Integration with biological databases

  • Functionality to map peaks to compounds, reactions, and modules
  • Vignette coming soon!